Text Relatedness Based on a Word Thesaurus

Authors

  • George Tsatsaronis
  • Iraklis Varlamis
  • Michalis Vazirgiannis
Abstract

The computation of relatedness between two fragments of text in an automated manner requires taking into account a wide range of factors pertaining to the meaning the two fragments convey, and the pairwise relations between their words. Without doubt, a measure of relatedness between text segments must take into account both the lexical and the semantic relatedness between words. Such a measure that captures well both aspects of text relatedness may help in many tasks, such as text retrieval, classification and clustering. In this paper we present a new approach for measuring the semantic relatedness between words based on their implicit semantic links. The approach exploits only a word thesaurus in order to devise implicit semantic links between words. Based on this approach, we introduce Omiotis, a new measure of semantic relatedness between texts which capitalizes on the word-to-word semantic relatedness measure (SR) and extends it to measure the relatedness between texts. We gradually validate our method: we first evaluate the performance of the semantic relatedness measure between individual words, covering word-to-word similarity and relatedness, synonym identification and word analogy; then, we proceed with evaluating the performance of our method in measuring text-to-text semantic relatedness in two tasks, namely sentence-to-sentence similarity and paraphrase recognition. Experimental evaluation shows that the proposed method outperforms every lexicon-based method of semantic relatedness in the selected tasks and the used data sets, and competes well against corpus-based and hybrid approaches.
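As a rough illustration of how a word-to-word score can be lifted to a text-to-text score, the sketch below averages each word's best match in the other text, in both directions. This is a generic aggregation scheme and not the Omiotis formula, which the abstract does not specify. The names `toy_word_relatedness`, `directed_score`, `text_relatedness` and the values in `TOY_SCORES` are hypothetical stand-ins; a real system would plug in a thesaurus-based word measure such as SR.

```python
from typing import Callable, Sequence

# Hand-made word-pair scores in [0, 1]. Purely illustrative: these are NOT
# values produced by the SR measure described in the abstract.
TOY_SCORES = {
    frozenset(("car", "automobile")): 1.0,
    frozenset(("car", "road")): 0.6,
    frozenset(("driver", "road")): 0.5,
    frozenset(("driver", "automobile")): 0.7,
}


def toy_word_relatedness(a: str, b: str) -> float:
    """Hypothetical stand-in for a word-to-word relatedness measure."""
    if a == b:
        return 1.0
    return TOY_SCORES.get(frozenset((a, b)), 0.0)


def directed_score(
    source: Sequence[str],
    target: Sequence[str],
    word_rel: Callable[[str, str], float],
) -> float:
    """Average, over the source words, of each word's best match in target."""
    if not source or not target:
        return 0.0
    best_matches = (max(word_rel(w, t) for t in target) for w in source)
    return sum(best_matches) / len(source)


def text_relatedness(
    text_a: Sequence[str],
    text_b: Sequence[str],
    word_rel: Callable[[str, str], float] = toy_word_relatedness,
) -> float:
    """Symmetric text score: mean of the two directed best-match averages."""
    forward = directed_score(text_a, text_b, word_rel)
    backward = directed_score(text_b, text_a, word_rel)
    return 0.5 * (forward + backward)


if __name__ == "__main__":
    print(text_relatedness(["car", "driver"], ["automobile", "road"]))
```

Averaging both directions keeps the score symmetric, so the order in which the two texts are passed does not matter. Texts are assumed to be already tokenized into word lists.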

Similar resources

Omiotis: A Thesaurus-Based Measure of Text Relatedness

In this paper we present a new approach for measuring the relatedness between text segments, based on implicit semantic links between their words, as offered by a word thesaurus, namely WordNet. The approach does not require any type of training, since it exploits only WordNet to devise the implicit semantic links between text words. The paper presents a prototype on-line demo of the measure, t...

Random Walk on WordNet to Measure Lexical Semantic Relatedness

The need to determine semantic relatedness or its inverse, semantic distance, between two lexically expressed concepts is a problem that pervades much of natural language processing such as document summarization, information extraction and retrieval, word sense disambiguation and the automatic correction of word errors in text. Standard ways of measuring similarity between two words on a thesa...

Can Network Embedding of Distributional Thesaurus be Combined with Word Vectors for Better Representation?

Distributed representations of words learned from text have proved to be successful in various natural language processing tasks in recent times. While some methods represent words as vectors computed from text using predictive model (Word2vec) or dense count based model (GloVe), others attempt to represent these in a distributional thesaurus network structure where the neighborhood of a word i...

Fast Semantic Relatedness: WordNet::Similarity vs Roget's Thesaurus

A Measure of Semantic Relatedness (MSR) automatically determines how close two words are in meaning. MSRs are used in such Natural Language Processing (NLP) problems as word-sense disambiguation or text summarization. To solve such problems may require millions of relatedness scores, but MSR run-time, clearly a major concern, has rarely been considered in NLP research. To evaluate an MSR, one o...

Analysis of Polysemy and Homographs of the Word "lead" in Roget's International Thesaurus, 3rd Edition

This paper follows from previous research on the relatedness or un-relatedness, in Roget's International Thesaurus (RIT), of entries which have identical spellings. A comparison is made between the output from research on a VAX 11/780 computer-accessible version of RIT, the RIT text, and a micro-computer database of the hierarchical and cross-reference information of RIT.

Journal:
  • J. Artif. Intell. Res.

Volume 37

Publication date: 2010